DSL-Infosys Coding Challenge

Why did the US dollar appreciate during the financial crisis?

Team Webb

Team Members

Mun Fai Chan

  • 2nd year BSc Economics
  • Regression analysis to explore the effect of various factors on exchange rates

Matthew Barnes

  • 3rd year BSc Economics
  • Investigated the role the US dollar played as a safe haven currency

Sanjit Neelam

  • 1st year BSc Financial Maths and Stats
  • Investigated the relationship between exchange rates and interest rates using time series techniques

Additional Resources Used

To form our hypotheses:

  • Variety of sources for data sets including World Bank, IMF, OECD, St. Louis Fed
  • These data sets included metrics such as GDP, FX reserves and interest rates

To improve our Python skills:

  • Articles and tutorials found through Google:
    • Use of seaborn and statsmodels packages for regressions
    • Using melt and merge functions in Pandas
    • Plotly interactive plots

Academic Skills

  • Tried to incorporate elements of Chapter 11, ST102 such as the residual plot (Sanjit)
  • Used econometric and macroeconomic knowledge to understand theorised and hypothesised factors behind USD appreciation, as well as to guide initial research (Matthew and Mun Fai)

Difficulties

  • Presenting using RISE, specifically scrolling on slides
    • A teammate resolved this for the other members
  • Working remotely in different time zones
    • Use of Microsoft Teams and when2meet.com
  • Bugs in the code
    • Developing techniques to identify where exactly the bug was caused
  • Highly technical subject with literature using advanced regression techniques
    • Identifying what factors could be assessed using the skill set we had or could reasonably acquire during the process

Organisation

  • Met as a team at university to learn about team members' skills, prior experience and subject knowledge
  • Delegated work based on this meeting
  • Used Microsoft Teams and GitHub

Hypothesis 1

The US dollar is a safe haven currency, so it appreciates during times of crisis, even when the crisis originated in the US.

This will be illustrated using yields on 10-year US Treasury bonds. Yields should fall as demand for these bonds rises; buying them requires US dollars, which increases demand for the dollar and pushes its value up.
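The inverse price-yield relationship behind this hypothesis can be sketched with a toy zero-coupon bond (the numbers are illustrative, not taken from our data):

```python
# Illustrative only: price of a zero-coupon bond paying a face value of 100
# in 10 years, at two different yields. When demand pushes the price up,
# the implied yield falls, and vice versa.
def zero_coupon_price(face, y, years):
    return face / (1 + y) ** years

p_high_yield = zero_coupon_price(100, 0.04, 10)  # lower demand, higher yield
p_low_yield = zero_coupon_price(100, 0.03, 10)   # higher demand, lower yield
print(round(p_high_yield, 2), round(p_low_yield, 2))  # 67.56 74.41
```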

In [1]:
# Import libraries 
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import seaborn as sns
import scipy as sp
from sklearn.linear_model import LinearRegression
import statsmodels.formula.api as smf
from statsmodels.tsa.stattools import grangercausalitytests

from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()

Loading and preparing exchange rate data

Data source: https://www.kaggle.com/brunotly/foreign-exchange-rates-per-dollar-20002019
Downloaded: 16th March 2020

In [2]:
# Read in data
fx = pd.read_csv("FX_rates.csv", usecols = list(range(1,24)))

# Replace missing data with NaN
fx = fx.replace('ND', np.nan)

# Convert dates into date format
fx['Time Serie'] = pd.to_datetime(fx['Time Serie'], format = '%Y-%m-%d')

# Create clean dataset with no missing data
fx = fx.dropna()

# Convert exchange rate columns from strings to floats
fx.iloc[:,1:23] = fx.iloc[:,1:23].apply(pd.to_numeric)

Lists of exchange rates, countries and colours for plot

In [3]:
# List of currency codes
curr_code = fx.columns[1:]

# Extract list of countries from exchange rate names
country = []
for code in fx.columns[1:]:
    a = code.split(" -")
    country.append(a[0])

# List of colours for countries
color = ['goldenrod', 'royalblue', 'black', 'mediumblue', 'gold', 
         'olive', 'red', 'mediumvioletred', 'deepskyblue', 'purple', 
         'green', 'seagreen', 'lightgreen', 'slategrey', 'rosybrown', 
         'royalblue', 'darkkhaki', 'cornflowerblue', 'maroon', 'mediumpurple', 
         'turquoise', 'navy']

Indexed FX rates

In [4]:
# Indexed fx rates from 2007-01-02 to 2009-12-31
fx_index = fx.iloc[1759:2514,].copy()

# Index each currency to 100 at the first date in the window
for n, currency in enumerate(curr_code):
    fx_index[currency] = fx_index[currency] / fx_index.iloc[0, 1 + n] * 100

US Treasury Yields

Data source: https://fred.stlouisfed.org/series/DGS10
Downloaded: 18th March 2020

In [5]:
# Read data
us_yield = pd.read_csv('US_yield.csv', skiprows = 13)

# Convert data column to date format
us_yield['date'] = pd.to_datetime(us_yield['date'], format = '%Y-%m-%d')

# Yield data from 2007-01-02 to 2009-12-31
us_yield_crisis = us_yield.iloc[11231:11983,:].copy()

Interactive plot

This plot shows currencies against the US dollar from 2007-01-02 to 2009-12-31. The exchange rates are indexed to 100 at 2007-01-02.

In [6]:
# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(go.Scatter(x = list(us_yield_crisis.date), 
                         y = list(us_yield_crisis[' value']), 
                         name = "<b>US 10 year Treasury yield</b>", 
                         line = dict(color = 'black')),
              secondary_y = True
)

for n, name in enumerate(country):
    fig.add_trace(go.Scatter(x = list(fx_index['Time Serie']), 
                             y = list(fx_index[curr_code[n]]), 
                             name = name, 
                             line = dict(color = color[n], 
                                         width = 1.5),
                             visible = 'legendonly'),
                  secondary_y = False,
    )

# Add title and change background colour
fig.update_layout(
    title_text = "US Treasury 10 year yield",
    plot_bgcolor = 'whitesmoke'
)


# Set y-axes titles
fig.update_yaxes(title_text = "Indexed $ exchange rate", 
                 secondary_y = False)
fig.update_yaxes(title_text = "US 10 year yield %", 
                 secondary_y = True)

fig.show()

Current account

Data source: https://data.worldbank.org/indicator/BN.CAB.XOKA.GD.ZS
Downloaded: 18th March 2020

In [7]:
# Load current account data
current_ac = pd.read_csv('Current_account.csv', 
                         skiprows = 4, 
                         usecols = [0] + list(range(44,63)))

# List of indexes for countries in fx data frame
country_index = [11, 66, 178, 79, 27, 
                 33, 38, 94, 107, 124, 
                 152, 261, 206, 56, 117, 
                 167, 175, 221, 136, 35, 
                 231]

# Select countries and data from 2007
current_ac = current_ac.iloc[country_index,[0,8]]

GDP

Data source: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD
Downloaded: 18th March 2020

In [8]:
# Load GDP data
gdp = pd.read_csv('gdp.csv', 
                  skiprows = 4, 
                  usecols = [0] + list(range(44,63)))

# Select countries and data from 2007
gdp = gdp.iloc[country_index, [0,8]]

Reserves

Data source: https://data.worldbank.org/indicator/FI.RES.XGLD.CD
Downloaded: 19th March 2020

In [9]:
# Load FX reserves data
res = pd.read_csv("fx_reserves.csv", 
                  skiprows = 3, 
                  usecols = [0] + list(range(44,63)))

# Select countries and data from 2007
res = res.iloc[country_index, [0,8]]

Ratio of reserves to GDP

In [10]:
# Divide reserves data frame by GDP data frame
res_ratio = res.iloc[:,1:].div(gdp.iloc[:,1:])

# Insert country name column
res_ratio.insert(value = res['Country Name'], 
                 loc = 0, 
                 column = 'Country')

Percentage change in exchange rate

2008-07-01 to 2009-01-30
In [11]:
# Select exchange rates from 2008-07-01 and 2009-01-30
percent_change = fx.iloc[[2136,2281], 1:].copy()

# Calculate the percentage change between the two dates for each currency
percent_change = percent_change.pct_change().iloc[1:, list(range(0,20)) + [21]]

# Add columns with country names, reserves/GDP and current account
percent_change = percent_change.T
percent_change['Country'] = list(current_ac['Country Name'])
percent_change['res_GDP'] = list(res_ratio['2007'])
percent_change['curr_ac'] = list(current_ac['2007'])

# Rename columns
percent_change.columns = ['per_ch', 'Country', 'res_GDP', 'curr_ac']

Relationship between FX reserves to GDP and % change in exchange rate (2008-07-01 to 2009-01-30)

In [12]:
sns.regplot(x = percent_change['res_GDP'], y = percent_change['per_ch'])
plt.xlabel('Ratio of FX reserves to GDP');
plt.ylabel('% change in exchange rate');

Relationship between current account and % change in exchange rate (2008-07-01 to 2009-01-30)

In [13]:
sns.regplot(x = percent_change['curr_ac'], y = percent_change['per_ch'])
plt.xlabel('Current account %');
plt.ylabel('% change in exchange rate');

Hypothesis 2

Macro metrics including GDP growth, current account position, FX reserves, interest rates, and assets and liabilities held in other countries influenced whether the USD appreciated against other currencies and the magnitude of that appreciation.
  • Exchange rates of other countries to explore the appreciation of the USD during the crisis (1 July 2008 to 31 January 2009)
  • Regression of exchange rates on different economic determinants in a pre-crisis period (2004-2006)
In [14]:
country_index = [11, 27, 33, 35, 38, 
                 56, 66, 79, 94, 107, 
                 117, 124, 136, 152, 167, 
                 175, 178, 206, 221, 261]

country_list = ['Australia', 'Brazil', 'Canada', 'China', 'Denmark',
                'Euro area', 'Hong Kong', 'India', 'Japan', 'Korea, Rep.',
                'Malaysia', 'Mexico', 'New Zealand', 'Norway', 'Singapore',
                'South Africa', 'Sri Lanka', 'Sweden', 'Switzerland', 'United Kingdom']

curr_code = ['AUD', 'EUR', 'NZD', 'GBP', 'BRL',
            'CAD', 'CNY', 'HKD', 'INR', 'KRW',
            'MXN', 'ZAR', 'SGD', 'DKK', 'JPY',
            'MYR', 'NOK', 'SEK', 'LKR', 'CHF',
            'TWD', 'THB']

FX rates

In [15]:
## Read in data
fx = pd.read_csv("FX_rates.csv", 
                 usecols = list(range(1,24)))

## Replace missing data with NaN
fx = fx.replace('ND', np.nan)

## Convert dates into date format
fx['Time Serie'] = pd.to_datetime(fx['Time Serie'],
                                  format = '%Y-%m-%d')

## Create clean dataset with no missing data
fx = fx.dropna().copy()

## Convert string numbers into floats
fx.iloc[:,1:23] = fx.iloc[:,1:23].apply(pd.to_numeric)

# List of exchange rate column names
fx_names = list(fx.iloc[:,1:].columns)

Percentage changes

Percentage change in exchange rate from 1 July 2008 to 31 January 2009 (defined as the crisis period)

In [16]:
percent_change = fx.iloc[[2136,2281],list(range(1,21))].copy()

percent_change = percent_change.pct_change().iloc[1]

per_sort = percent_change.sort_index(ascending = True)
In [17]:
plt.bar(x = country_list, height = per_sort)
plt.xticks(rotation = 90);
plt.ylabel('Appreciation (0.5 = 50% increase)');

Macro fundamentals

  • Current account (% of GDP)
  • GDP
  • GDP per capita growth
  • Foreign exchange reserves (% of GDP)
  • Real interest rates
  • Inflation

External exposure

  • Portfolio investment assets (CPIS)
  • Portfolio investment liabilities (CPIS)

Current account Balance (% of GDP)

In [18]:
current_ac = pd.read_csv('Current_account.csv', 
                         skiprows = 4, 
                         usecols = [0] + list(range(44,63)))

current_ac = current_ac.iloc[country_index,:].copy()

current_ac = current_ac.sort_values('Country Name').copy()

current_ac['per_ch'] = list(per_sort)



# Add a column for the average current account over 2004-2006
ca_avg = current_ac.iloc[:,5:8]
current_ac['ca_avg'] = ca_avg.mean(axis=1)

# Removing all irrelevant columns for regression 
current_ac.drop(current_ac.columns[1:-2], axis =1, inplace = True)
current_ac = current_ac.sort_values(by='Country Name').copy()
current_ac = current_ac.rename({'Country Name': 'Country'}, axis=1)

# Resetting index of database (arranged alphabetically)
current_ac.reset_index(drop = True, inplace = True)

GDP

In [19]:
gdp = pd.read_csv('gdp.csv', 
                  skiprows = 4, 
                  usecols = [0] + list(range(44,63)))

gdp = gdp.iloc[country_index,:].copy()
gdp = gdp.rename({'Country Name': 'Country'}, axis=1)  

# Adding in column for avg GDP from years 2004-2006
gdp_avg = gdp.iloc[:, 5:8]
gdp['gdp_avg'] = gdp_avg.mean(axis=1)
gdp = gdp.sort_values(by='Country').copy()

#Removing all irrelevant columns for regression in a new df called gdp_new
gdp_new = gdp.drop(gdp.columns[1:5], axis =1).copy()
gdp_2 = gdp_new.drop(gdp_new.columns[4:-1], axis =1).copy()
gdp_avg = gdp.drop(gdp.columns[1:-1], axis =1).copy()

# Resetting index of database (arranged alphabetically)
gdp.reset_index(drop = True, inplace = True)

GDP per capita growth (annual %)

In [20]:
gdp_growth = pd.read_csv("gdp_growth.csv", 
                  skiprows = 3, 
                  usecols = [0] + list(range(44,63)))

gdp_growth = gdp_growth.iloc[country_index,:].copy()
gdp_growth = gdp_growth.rename({'Country Name': 'Country'}, axis=1)

#Adding in average GDP growth for years 2004 - 2006 
gdp_growth_avg = gdp_growth.iloc[:, 5:8]
gdp_growth['gdp_growth_avg'] = gdp_growth_avg.mean(axis=1)

# Removing irrelevant columns for regression 
gdp_growth = gdp_growth.drop(gdp_growth.columns[1:-1], axis =1).copy()
gdp_growth = gdp_growth.sort_values(by='Country').copy()
gdp_growth.reset_index(drop = True, inplace = True)

FX Reserves

In [21]:
res = pd.read_csv("fx_reserves.csv", 
                  skiprows = 3, 
                  usecols = [0] + list(range(44,63)))

res = res.iloc[country_index,:].copy()

res = res.sort_values(by='Country Name').copy()
res.reset_index(drop = True, inplace = True)

# Scale res by GDP to produce ratio 

ratio = res.iloc[:,1:].div(gdp.iloc[:,1:-1])


ratio.insert(value = res['Country Name'], 
             loc = 0, 
             column = 'Country')


# Insert average of ratio from Years 2004-2006 
ratio_avg = ratio.iloc[:, 5:8]
ratio['res_avg'] = ratio_avg.mean(axis=1) 

#Remove irrelevant columns for regression 
ratio = ratio.drop(ratio.columns[1:-1], axis =1).copy()

Real Interest Rates

In [22]:
ir = pd.read_csv("interest_rate.csv", 
                  skiprows = 4,
                  usecols = [0] + list(range(44,63)))
ir = ir.iloc[country_index,:].copy()
ir = ir.rename({'Country Name': 'Country'}, axis=1) 

# Insert average interest rate over 2004-2006 
ir_avg = ir.iloc[:, 5:8]
ir['ir_avg'] = ir_avg.mean(axis=1) 

# Missing data for Denmark, the Euro area, Switzerland, Sri Lanka and Norway (obtained from other data sources)

#Remove irrelevant columns for regression 
ir = ir.drop(ir.columns[1:-1], axis =1).copy()
ir = ir.sort_values(by='Country').copy()
ir.reset_index(drop = True, inplace = True)

# Inserting missing real interest rate values 
# Missing values obtained from external sources 
ir.iloc[18,1] = 2.057
ir.iloc[4,1] = 0.770833333333
ir.iloc[13,1] = 0.99444444444
ir.iloc[5,1] = 0.152086111111

Inflation, consumer prices ( annual %)

In [23]:
inf = pd.read_csv("inflation.csv", 
                  skiprows = 4,
                  usecols = [0] + list(range(44,63)))

inf = inf.iloc[country_index,:].copy()
inf = inf.rename({'Country Name': 'Country'}, axis=1) 

# Insert average inflation over 2004-2006 
inf_avg = inf.iloc[:, 5:8]
inf['inf_avg'] = inf_avg.mean(axis=1) 

#Remove irrelevant columns for regression 
inf = inf.drop(inf.columns[1:-1], axis =1).copy()

inf = inf.sort_values(by='Country').copy()
inf.reset_index(drop = True, inplace = True)

Portfolio Investment Assets

In [24]:
assets = pd.read_csv("CPIS_assets.csv", 
                     skiprows = 4,
                     usecols = [1,5,6,7])
assets.rename({'Unnamed: 1': 'Country', 'DEC. 2004' : '2004', 'DEC. 2005' : '2005', 'DEC. 2006' : '2006'}, axis=1, inplace = True)
assets = assets.sort_values(by='Country').copy()
assets.iloc[:,1:] = assets.iloc[:,1:].apply(pd.to_numeric)

assets.reset_index(drop = True, inplace = True)

# Divide assets by GDP 
assets_new = assets.iloc[:,1:].div(gdp.iloc[:,5:8])

# Find the average of assets from Year 2004-2006 
assets['assets_avg'] = assets_new.mean(axis=1) 

#Remove irrelevant columns for regression 
assets = assets.drop(assets.columns[1:-1], axis =1).copy()

Portfolio Investment Liabilities

In [25]:
liab = pd.read_csv("CPIS_liab2.csv", 
                    skiprows = 4,
                    usecols = [1,5,6,7])

liab = liab.rename({'Unnamed: 0': 'Country', 'DEC. 2004' : '2004', 'DEC. 2005' : '2005', 'DEC. 2006' : '2006'}, axis=1)
# Sort after renaming and assign the result (sort_values returns a copy)
liab = liab.sort_values(by='Country').copy()


liab.reset_index(drop = True, inplace = True)
liab.iloc[:,1:] = liab.iloc[:,1:].apply(pd.to_numeric)

# Divide liabilities by GDP 
liab_new = liab.iloc[:,1:].div(gdp.iloc[:,5:8])

# Find the average of liabilities over 2004-2006 
liab['liab_avg'] = liab_new.mean(axis=1) 

# Remove irrelevant columns for regression 
liab = liab.drop(liab.columns[1:-1], axis =1).copy()

Combining datasets for regression

In [26]:
reg = pd.merge(current_ac, gdp_avg, on = "Country", how = "outer")
reg = pd.merge(reg, ratio, on = "Country", how = "outer")
reg = pd.merge(reg, ir, on = "Country")
reg = pd.merge(reg, inf, on = "Country")
reg = pd.merge(reg, gdp_growth, on = "Country")
reg = pd.merge(reg, liab, on = "Country", how = "outer")
reg = pd.merge(reg, assets, on = "Country", how = "outer")

Regressions

Regression 1


Use all variables except GDP (used as a scaling factor).

In [27]:
results_reg = smf.ols('per_ch ~ gdp_growth_avg + ca_avg + res_avg + ir_avg + inf_avg + liab_avg + assets_avg', 
                      data = reg).fit()
In [28]:
print(results_reg.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 per_ch   R-squared:                       0.566
Model:                            OLS   Adj. R-squared:                  0.229
Method:                 Least Squares   F-statistic:                     1.677
Date:                Sun, 05 Apr 2020   Prob (F-statistic):              0.231
Time:                        20:18:30   Log-Likelihood:                 12.567
No. Observations:                  17   AIC:                            -9.134
Df Residuals:                       9   BIC:                            -2.468
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          0.1539      0.158      0.973      0.356      -0.204       0.512
gdp_growth_avg    -0.0257      0.038     -0.673      0.518      -0.112       0.061
ca_avg            -0.0035      0.008     -0.452      0.662      -0.021       0.014
res_avg           -0.2946      0.352     -0.836      0.425      -1.092       0.503
ir_avg            -0.0017      0.005     -0.325      0.752      -0.013       0.010
inf_avg            0.0843      0.044      1.920      0.087      -0.015       0.184
liab_avg        8.541e+05    5.8e+05      1.474      0.175   -4.57e+05    2.17e+06
assets_avg     -2.595e+05   5.02e+05     -0.517      0.618    -1.4e+06    8.76e+05
==============================================================================
Omnibus:                        4.257   Durbin-Watson:                   1.541
Prob(Omnibus):                  0.119   Jarque-Bera (JB):                2.126
Skew:                          -0.810   Prob(JB):                        0.345
Kurtosis:                       3.615   Cond. No.                     2.12e+08
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 5.16e-14. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/scipy/stats/stats.py:1534: UserWarning:

kurtosistest only valid for n>=20 ... continuing anyway, n=17

Regression 2


Remove currency peggers from regression (China)

Remove safe haven currencies (Japan, Hong Kong)

In [29]:
# Drop China (row 3), Hong Kong (row 6) and Japan (row 8)
reg2 = reg.drop(reg.index[[3, 6, 8]]).copy()
In [30]:
results_reg2 = smf.ols('per_ch ~ gdp_growth_avg + ca_avg + res_avg + inf_avg + ir_avg  + liab_avg + assets_avg', 
                       data = reg2).fit()
print(results_reg2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 per_ch   R-squared:                       0.819
Model:                            OLS   Adj. R-squared:                  0.637
Method:                 Least Squares   F-statistic:                     4.514
Date:                Sun, 05 Apr 2020   Prob (F-statistic):             0.0324
Time:                        20:18:30   Log-Likelihood:                 21.845
No. Observations:                  15   AIC:                            -27.69
Df Residuals:                       7   BIC:                            -22.02
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==================================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          0.5467      0.113      4.847      0.002       0.280       0.813
gdp_growth_avg    -0.0562      0.022     -2.573      0.037      -0.108      -0.005
ca_avg            -0.0088      0.005     -1.960      0.091      -0.019       0.002
res_avg           -0.0172      0.201     -0.086      0.934      -0.494       0.459
inf_avg           -0.0098      0.031     -0.315      0.762      -0.083       0.064
ir_avg             0.0047      0.003      1.506      0.176      -0.003       0.012
liab_avg         5.46e+05   3.13e+05      1.743      0.125   -1.95e+05    1.29e+06
assets_avg     -5.516e+05   2.77e+05     -1.988      0.087   -1.21e+06    1.04e+05
==============================================================================
Omnibus:                        5.910   Durbin-Watson:                   1.794
Prob(Omnibus):                  0.052   Jarque-Bera (JB):                2.874
Skew:                          -0.766   Prob(JB):                        0.238
Kurtosis:                       4.500   Cond. No.                     2.11e+08
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.83e-14. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/scipy/stats/stats.py:1534: UserWarning:

kurtosistest only valid for n>=20 ... continuing anyway, n=15

Conclusion


  • Strong multicollinearity --> individually insignificant coefficient estimates
  • Variance Inflation Factor (remove variables that are closely correlated)
  • Partial Least Squares Regression


Hypothesis 3

Short-term interest rates affect the direction and magnitude of movements in exchange rates vis-à-vis the US dollar.

This will be investigated using time series analysis.

USD/GBP rates

In [31]:
#Change column names for convenience
fx.columns = ['Date'] + curr_code

#Daily UK exchange rates, Jan 2005 to Jan 2009 inclusive
fx_uk = fx.iloc[range(1257, 2282), [0, 4]]

#Monthly average UK exchange rates, Jan 2005 to Jan 2009 inclusive
fx_uk = fx_uk.set_index('Date')
fx_uk = fx_uk.resample('M').mean()

Interest rates

The use of short-term interest rate data is motivated by two factors:

a) sources have mentioned that the unwinding of carry trades may be able to explain exchange rate movements during the global financial crisis, and

b) monthly data is available for it at https://data.oecd.org/interest/short-term-interest-rates.htm.

UK short-term interest rates

In [32]:
#Monthly real short-term interest rates, Jan 2005 to Jan 2009 inclusive
ir_uk = pd.read_csv("UK_STrates.csv", usecols = [5,6])
ir_uk = ir_uk.iloc[range(0, 49), :]

#Convert dates into pd.Datetime object 
ir_uk['TIME'] = pd.to_datetime(ir_uk['TIME'], format = '%Y-%m-%d')

Stationarity

We want the data to be stationary: stationarity matters in the regression context because many of the results that hold for independent data also hold for stationary data.

The mean and variance of a stationary time series are constant over time. Here is a comparison between a stationary and a non-stationary series (image source: https://upload.wikimedia.org/wikipedia/commons/e/e1/Stationarycomparison.png).

Spurious correlation

A variable with a time trend is highly likely to be correlated with another variable that has a time trend. For example, person A's weight from birth to age 20 will be highly correlated with person B's weight from birth to age 20, and this holds for virtually any two people randomly selected from the global population. But obviously, person A's weight does not 'cause' and is not 'caused by' person B's weight.

Making a series stationary helps to reduce the possibility of detecting such a spurious correlation.
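This can be demonstrated with two simulated, completely independent random walks (toy data, not our FX series): their levels frequently show a sizable correlation purely by chance, while their first differences do not.

```python
import numpy as np
import scipy.stats as st

# Two independent random walks (non-stationary series)
rng = np.random.default_rng(1)
a = np.cumsum(rng.normal(size=500))
b = np.cumsum(rng.normal(size=500))

# Correlation of the levels can be large despite no real relationship;
# after differencing, the series are iid and the correlation collapses
r_levels, _ = st.pearsonr(a, b)
r_diff, _ = st.pearsonr(np.diff(a), np.diff(b))
print(round(r_levels, 2), round(r_diff, 2))
```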

Making a time series stationary

Two common ways of achieving this are:

1) Taking the logarithm of the series and

2) taking the first difference of the series (difference of consecutive values).

In [33]:
plt.rcParams.update({'figure.figsize':(9,5), 'figure.dpi':120})

#plt.subplot(nrows, ncols, index): a grid with 3 rows and 1 column; index selects the subplot
plt.subplot(3, 1, 1)
plt.plot(fx_uk.index, fx_uk['GBP'])
plt.ylabel('GBP')
#Removes xticks to improve readability
plt.xticks([])
plt.title('USD/GBP')

plt.subplot(3, 1, 2)
plt.plot(fx_uk.index, np.log(fx_uk['GBP']))
#Removes xticks, since the bottom subplot has xticks
plt.xticks([])
plt.title('log(USD/GBP)')

plt.subplot(3, 1, 3)
plt.plot(fx_uk.index, fx_uk['GBP'].diff())
plt.xlabel('Date')
plt.title('Differenced USD/GBP')

plt.show()

Differencing makes the series noticeably more stationary. There are more quantitative methods to measure this, which is a possible extension. Arguably, the mean and/or variance increase towards the end of the period, so it may be worth restricting the period considered.

In [34]:
#plt.subplot(nrows, ncols, index): a grid with 3 rows and 1 column; index selects the subplot
plt.subplot(3, 1, 1)
plt.plot(fx_uk.index, ir_uk['Value'])
plt.ylabel('%')
#Removes xticks to improve readability. The bottom subplot still has xticks
plt.xticks([])
plt.title('UK Short-term interest rates')

plt.subplot(3, 1, 2)
plt.plot(fx_uk.index, np.log(ir_uk['Value']))
plt.xticks([])
plt.title('log(UK Short-term interest rates)')

plt.subplot(3, 1, 3)
plt.plot(fx_uk.index, ir_uk['Value'].diff())
plt.xlabel('Date')
plt.title('Differenced UK Short-term interest rates')

plt.show()

Differencing appears to make the series more stationary, but this is less obvious than before. Here, we can say with more confidence that the mean decreases and the variance increases towards the end of the period, so an extension would be to restrict the period considered, perhaps from 2004-2007, since this would be entirely pre-crisis.

Pearson correlation coefficient

In [35]:
print ("Pearson correlation coefficient and p-value: ",
       sp.stats.pearsonr(fx_uk['GBP'].diff()[1:], ir_uk['Value'].diff()[1:]))
Pearson correlation coefficient and p-value:  (-0.5626616026874437, 3.146858631906517e-05)

The p-value roughly indicates the probability of an uncorrelated system producing datasets with a correlation coefficient at least this extreme. Note that the accuracy of the p-value is reduced by the fact that our differenced vectors only have 48 data points each. This is a consequence of having only monthly data, and of the short pre-crisis period we are interested in.
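The interpretation above can be illustrated with a small permutation sketch on toy data (not our FX series): shuffle one vector repeatedly and count how often an uncorrelated pairing produces a correlation at least as extreme as the observed one.

```python
import numpy as np
import scipy.stats as st

# Toy data with a genuine relationship, same length as our vectors
rng = np.random.default_rng(3)
x = rng.normal(size=48)
y = x + rng.normal(scale=0.5, size=48)

r_obs, _ = st.pearsonr(x, y)

# Shuffling y breaks any real relationship with x, so the shuffled
# correlations show what "no correlation" looks like for this sample size
n_perm = 2000
count = 0
for _ in range(n_perm):
    r_perm, _ = st.pearsonr(x, rng.permutation(y))
    if abs(r_perm) >= abs(r_obs):
        count += 1
print(count / n_perm)  # small, matching the analytic p-value's interpretation
```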

Linear Regression

This small p-value suggests that there may be a causal relationship to be found between the variables, as we hypothesised. We can fit a regression model to test aspects of this.

In [36]:
#Hypothesised explanatory variable; differenced interest rate
#Reshaped since this is the format expected by the model
x = np.array(ir_uk['Value'].diff()[1:])
x_reshaped = np.array(ir_uk['Value'].diff()[1:]).reshape((-1, 1))

#Hypothesised response variable; differenced exchange rate
y = np.array(fx_uk['GBP'].diff()[1:])

model = LinearRegression().fit(x_reshaped, y)
In [37]:
print ("R**2: ", model.score(x_reshaped, y))
R**2:  0.31658807913880294

Around 32% of the variation in the differenced USD/GBP exchange rate can be explained by changes in UK short-term interest rates, according to this model.

Residual plot

One way to quickly assess the adequacy of a linear model is by plotting the residuals, i.e. the difference between the actual and predicted y-values:

In [38]:
plt.plot(fx_uk['GBP'].diff()[1:] - model.predict(x_reshaped))
plt.xticks([])
plt.title('Residuals, UK model')
Out[38]:
Text(0.5, 1.0, 'Residuals, UK model')

There is no clear pattern among the residuals, which suggests that the model is not inadequate.

Granger causality test

This is used to determine if one time series will be useful to forecast another.

If x causes y, the forecast of y based on previous values of y and previous values of x should outperform the forecast of y based on previous values of y alone.

Null hypothesis: The series in the second column does not Granger cause the series in the first.

In [39]:
inp = pd.DataFrame({'x': x, 'y': y})
grangercausalitytests(inp, maxlag=2)
#maxlag specifies how much to lag x and y by 
Granger Causality
number of lags (no zero) 1
ssr based F test:         F=14.8465 , p=0.0004  , df_denom=44, df_num=1
ssr based chi2 test:   chi2=15.8588 , p=0.0001  , df=1
likelihood ratio test: chi2=13.6649 , p=0.0002  , df=1
parameter F test:         F=14.8465 , p=0.0004  , df_denom=44, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=9.4589  , p=0.0004  , df_denom=41, df_num=2
ssr based chi2 test:   chi2=21.2248 , p=0.0000  , df=2
likelihood ratio test: chi2=17.4525 , p=0.0002  , df=2
parameter F test:         F=9.4589  , p=0.0004  , df_denom=41, df_num=2
Out[39]:
{1: ({'ssr_ftest': (14.84651566769291, 0.0003752297812108874, 44.0, 1),
   'ssr_chi2test': (15.858778099581064, 6.824921510592009e-05, 1),
   'lrtest': (13.664920578370172, 0.00021849818112427115, 1),
   'params_ftest': (14.846515667692914, 0.0003752297812108874, 44.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x11942fdf0>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x11942ff40>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (9.458896197844197, 0.00041896243307393353, 41.0, 2),
   'ssr_chi2test': (21.224840248821124, 2.4608463624758988e-05, 2),
   'lrtest': (17.452465747679867, 0.00016227260763054552, 2),
   'params_ftest': (9.458896197844194, 0.00041896243307393353, 41.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x11942fa90>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x11942f9d0>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])])}

Looking at the p-values, the results are significant at levels below 0.1%.

Repeating for USD/JPY and Japan short-term interest rates

In [40]:
#Daily USD/JPY exchange rate, Jan 2005 to Jan 2009 inclusive
fx_jpn = fx.iloc[range(1257, 2282), [0, 15]]

#Monthly average USD/JPY exchange rate, Jan 2005 to Jan 2009 inclusive
fx_jpn = fx_jpn.set_index('Date')
fx_jpn = fx_jpn.resample('M').mean()

#Monthly real short-term interest rates, Jan 2005 to Jan 2009 inclusive
ir_jpn = pd.read_csv("JPN_STrates.csv", usecols = [5,6])
ir_jpn = ir_jpn.iloc[range(0, 49), :]

#Convert dates into pandas datetime objects
ir_jpn['TIME'] = pd.to_datetime(ir_jpn['TIME'], format = '%Y-%m-%d')

print ('Pearson correlation coefficient and p-value: ', 
       sp.stats.pearsonr(fx_jpn['JPY'].diff()[1:], ir_jpn['Value'].diff()[1:])) 
Pearson correlation coefficient and p-value:  (0.04405162290190072, 0.7662390191136474)

The results are very different for the Japanese yen! However, this does not lead us to reject our hypothesis, since the yen is itself widely regarded as a 'safe-haven' currency.

In [41]:
#Hypothesised explanatory variable: differenced interest rates
x_ = np.array(ir_jpn['Value'].diff()[1:])
#Reshaped into a column vector, the format expected by the model
x_reshaped_ = x_.reshape((-1, 1))

#Hypothesised response variable; differenced exchange rate
y_ = np.array(fx_jpn['JPY'].diff()[1:])

model = LinearRegression().fit(x_reshaped_, y_)

#R**2 score 
print ("R**2: ", model.score(x_reshaped_, y_))
R**2:  0.0019405454802911004
In [42]:
plt.plot(fx_jpn['JPY'].diff()[1:] - model.predict(x_reshaped_))
plt.xticks([])
plt.title('Residuals')
Out[42]:
Text(0.5, 1.0, 'Residuals')
In [43]:
inp_ = pd.DataFrame({'x': x_, 'y': y_})
grangercausalitytests(inp_, maxlag=2)
Granger Causality
number of lags (no zero) 1
ssr based F test:         F=2.2577  , p=0.1401  , df_denom=44, df_num=1
ssr based chi2 test:   chi2=2.4116  , p=0.1204  , df=1
likelihood ratio test: chi2=2.3518  , p=0.1251  , df=1
parameter F test:         F=2.2577  , p=0.1401  , df_denom=44, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=1.8018  , p=0.1778  , df_denom=41, df_num=2
ssr based chi2 test:   chi2=4.0430  , p=0.1325  , df=2
likelihood ratio test: chi2=3.8751  , p=0.1441  , df=2
parameter F test:         F=1.8018  , p=0.1778  , df_denom=41, df_num=2
Out[43]:
{1: ({'ssr_ftest': (2.257679962634347, 0.14009682241702562, 44.0, 1),
   'ssr_chi2test': (2.4116126873594164, 0.12043823538912794, 1),
   'lrtest': (2.3517798665473038, 0.12514005476164972, 1),
   'params_ftest': (2.25767996263436, 0.14009682241702498, 44.0, 1.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x10cb874f0>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x10cb876d0>,
   array([[0., 1., 0.]])]),
 2: ({'ssr_ftest': (1.8017834952390048, 0.17782484802101042, 41.0, 2),
   'ssr_chi2test': (4.043026379560693, 0.1324548839829618, 2),
   'lrtest': (3.8751212584788277, 0.14405492489615154, 2),
   'params_ftest': (1.8017834952390213, 0.17782484802100734, 41.0, 2.0)},
  [<statsmodels.regression.linear_model.RegressionResultsWrapper at 0x11942fac0>,
   <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x11942f400>,
   array([[0., 0., 1., 0., 0.],
          [0., 0., 0., 1., 0.]])])}

Further exploration

Matthew

  • Use datasets with more currencies
  • Improve / experiment with different methods of calculating the percentage change in exchange rates
    • Perhaps use the maximum change within the period instead of only the start and end points
  • Apply the same scrutiny to how the other metrics are measured
    • Perhaps an average over the pre-crisis period would be better than only using 2007
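The first idea above could be sketched as follows (on a hypothetical daily series, not the actual FX dataset), comparing the start-to-end percentage change with the maximum change relative to the starting level:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series standing in for an FX rate over a crisis window
rng = np.random.default_rng(2)
rate = pd.Series(100 + rng.normal(size=250).cumsum())

# Start/end percentage change (the approach used so far)
start_end = (rate.iloc[-1] / rate.iloc[0] - 1) * 100

# Largest absolute percentage change from the starting level at any point in the window
max_change = ((rate / rate.iloc[0] - 1) * 100).abs().max()

print(start_end, max_change)
```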

Mun Fai

  • Explore the use of the variance inflation factor to see which variables are most strongly correlated
  • Partial least squares regression to address the problem of multicollinearity in regression
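The variance inflation factor idea is available in statsmodels; a sketch on a hypothetical design matrix (the column names are illustrative, not the project's actual regressors) where two regressors are nearly collinear:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical design matrix with two highly correlated regressors
rng = np.random.default_rng(3)
gdp = rng.normal(size=100)
reserves = gdp + rng.normal(scale=0.1, size=100)   # nearly collinear with gdp
rates = rng.normal(size=100)                       # independent

X = pd.DataFrame({'gdp': gdp, 'reserves': reserves, 'rates': rates})
X = X.assign(const=1.0)   # VIF should be computed on the design matrix incl. intercept

# VIF > ~10 is a common rule of thumb for problematic multicollinearity
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != 'const'}
print(vifs)
```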

Sanjit

  • Find other reasons why the correlation may be spurious.
  • Restrict the period considered so that the series used are more stationary.
  • Check the validity of the analysis with a teacher.
  • Repeat the analysis for every currency in the FX rates dataset and see if any patterns emerge
    • e.g. we would hope that CHF is similar to JPY, AUD similar to GBP etc.
  • Explore Granger causality further
    • Does our data satisfy its assumptions?
    • What are the other limitations of the model?